ABSTRACT

Background: The reliability of binary exposure
classification methods is routinely reported in
occupational health literature because it is viewed as an
important component of evaluating the trustworthiness
of the exposure assessment by experts. The kappa
statistic (k) is typically employed to assess how well
raters or classification systems agree in a variety of
contexts, such as identifying exposed participants in a
population-based epidemiological study of risks due to
occupational exposures. However, the question we are
really interested in is not so much the reliability of an
exposure assessment method, although this holds
value in itself, but the validity of the exposure
estimates. The validity of binary classifiers can be
expressed as a method's sensitivity (SN) and
specificity (SP), estimated from its agreement with the
error-free classifier.

Methods and results: We describe a simulation-based
method for deriving information on SN and SP from k
and the prevalence of exposure, since an analytic
solution is not possible without restrictive
assumptions. This work is illustrated in the context of
a comparison of job-exposure matrices assessing
occupational exposures to polycyclic aromatic
hydrocarbons.

Discussion: Our approach allows investigators to
evaluate how good their exposure-assessment methods
truly are, not just how well they agree with each other,
and should lead to incorporation of information on the
validity of expert assessment methods into formal
uncertainty analyses in epidemiology.
Strengths and limitations of this study

The main strength of our approach is that it is
flexible and easy to implement.

Our methodology accounts for realistic uncertainties
that an epidemiologist faces in evaluating the
plausible extent of exposure misclassification.

The main limitation of our work is that it does
not yet account for correlated errors in exposure
estimates that are common in the field, and the
importance of this limitation remains to be
understood.

INTRODUCTION

The reliability of binary exposure classification
methods is routinely reported in occupational
health literature because it is viewed as an
important component of evaluating the
trustworthiness of the exposure assessment. The
kappa statistic (k) is typically employed to
assess how well raters or classification
systems agree in a variety of contexts, such as
identifying exposed participants in a
population-based epidemiological study of risks
due to occupational exposures. Most recently,
Offermans et al 1 estimated agreement among
various expert-based methods of assessing
exposures in a cohort (job-exposure matrices and
case-by-case evaluations). The authors reported k
coefficients for these methods that are not unlike
those presented previously in a review by
Teschke et al, 2 which suggests that k values of
about 0.6 or worse are a fair summary of what these
methods generally yield in terms of inter-rater
agreement in a typical study of occupational
exposures. However, the question we are really
interested in is not so much the reliability of a
method to assess exposure, although this holds
value in itself, but the validity of the exposure
estimates.

The validity of binary classifiers can be
expressed as a method's sensitivity (SN) and
specificity (SP), estimated from its agreement
with the error-free classifier (also known as
the 'gold standard'). 3 But how does one infer
what k tells us about the validity of exposure
estimates (ie, SN and SP) when a true value
(gold standard) is unavailable? Generally,
reliability contains information on validity, 3
but in the case of k, its relationship with SN
and SP is also affected by the prevalence of
exposure (Pr). An analytic solution in this case is not
possible without restrictive assumptions about the actual
prevalence and the relationship between SN and SP.
Therefore, we developed a simulation-based method for
deriving information on SN and SP based on k and Pr.
We illustrate this method in the context of a comparison
of job-exposure matrices assessing occupational
exposures to polycyclic aromatic hydrocarbons (PAHs). 1

METHOD 

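We propose a simulation-based method to calculate the
values of SN and SP that are consistent with the
observed k and Pr. The relationship among k, SN, SP
and Pr can be described mathematically, if we assume
two conditionally independent raters with the same
validity, by:

$$k = \frac{\mathrm{Pr} \times (\mathrm{SP} - 1 + \mathrm{SN})^2 \times (\mathrm{Pr} - 1)}{(\mathrm{Pr} \times \mathrm{SN} - \mathrm{SP} - \mathrm{Pr} + \mathrm{Pr} \times \mathrm{SP}) \times (\mathrm{Pr} \times \mathrm{SN} + 1 - \mathrm{SP} - \mathrm{Pr} + \mathrm{Pr} \times \mathrm{SP})} \tag{1}$$

We assume that exposure classification by experts is
better than chance, as expressed by:

$$\mathrm{SN} + \mathrm{SP} > 1 \tag{2}$$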
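As a minimal sketch of Eqs. (1) and (2), the relationship can be written as two small Python functions. This is an illustration only and is not the authors' R code; the function names `kappa_from_validity` and `better_than_chance` and the input values in the usage lines are hypothetical.

```python
def kappa_from_validity(sn: float, sp: float, pr: float) -> float:
    """Kappa implied by Eq. (1) for two conditionally independent raters
    that share the same sensitivity (sn) and specificity (sp).

    Assumes 0 < pr < 1 and that the raters do not classify everyone
    (or no one) as exposed, so the denominator is non-zero.
    """
    # Probability that a single rater classifies a subject as exposed.
    p_pos = pr * sn + 1 - sp - pr + pr * sp
    numerator = pr * (sp - 1 + sn) ** 2 * (pr - 1)
    denominator = (pr * sn - sp - pr + pr * sp) * p_pos
    return numerator / denominator


def better_than_chance(sn: float, sp: float) -> bool:
    """Eq. (2): classification is better than chance when SN + SP > 1."""
    return sn + sp > 1


if __name__ == "__main__":
    # Hypothetical inputs chosen for illustration, not taken from the paper.
    print(kappa_from_validity(sn=0.8, sp=0.95, pr=0.05))
    print(better_than_chance(0.8, 0.95))
```

Two limiting cases give a quick sanity check: perfect raters (SN = SP = 1) yield k = 1, and raters no better than chance (SN + SP = 1) yield k = 0.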
First, we define the distributions of the lower ($k_l$) and
upper ($k_h$) bounds of k by using uniform distributions
(U) as $k_l \sim U(a_1, a_2)$ and $k_h \sim U(b_1, b_2)$. We
further define the distribution of Pr as a Beta
distribution, $\mathrm{Pr} \sim \mathrm{Beta}(c, d)$. Information
required to specify these distributions with reasonable
credibility is available in reports evaluating inter-rater
agreements, as in reference 1. We can then calculate
(multiple) lower bounds of SN and SP ($\mathrm{SN}_l$ and
$\mathrm{SP}_l$) that are consistent with these distributions,
following:

$$\mathrm{SN}_l = \frac{k_l}{(1 - \mathrm{Pr}) + k_l \times \mathrm{Pr}} \tag{3}$$

$$\mathrm{SP}_l = \frac{k_l}{\mathrm{Pr} + k_l \times (1 - \mathrm{Pr})} \tag{4}$$
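One way to read Eqs. (3) and (4), offered here as an interpretation rather than a statement from the text: each lower bound is the value that Eq. (1) returns for one parameter when the other is perfect (equal to 1) and k is at its lower bound. The Python sketch below checks this numerically; the function names and the inputs are hypothetical.

```python
from typing import Tuple


def kappa_from_validity(sn: float, sp: float, pr: float) -> float:
    """Kappa implied by Eq. (1) for two raters with equal SN and SP."""
    p_pos = pr * sn + 1 - sp - pr + pr * sp
    numerator = pr * (sp - 1 + sn) ** 2 * (pr - 1)
    denominator = (pr * sn - sp - pr + pr * sp) * p_pos
    return numerator / denominator


def lower_bounds(k_l: float, pr: float) -> Tuple[float, float]:
    """Lower bounds on SN and SP from Eqs. (3) and (4)."""
    sn_l = k_l / ((1 - pr) + k_l * pr)
    sp_l = k_l / (pr + k_l * (1 - pr))
    return sn_l, sp_l


if __name__ == "__main__":
    # Hypothetical inputs for illustration.
    k_l, pr = 0.3, 0.05
    sn_l, sp_l = lower_bounds(k_l, pr)
    # Plugging each bound back into Eq. (1), with the other parameter
    # set to 1, should recover k_l up to floating-point error.
    print(sn_l, kappa_from_validity(sn_l, 1.0, pr))
    print(sp_l, kappa_from_validity(1.0, sp_l, pr))
```

At low prevalence the bound on SP is much tighter than the bound on SN, which is why the sampled SN values can spread over a wider range than the sampled SP values.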
The upper theoretical bounds on SN and SP are known
(ie, these are 1) and, even though no other information
is available, this enables us to sample plausible SN and
SP values from the uniform distribution constrained by
the lower bounds ($\mathrm{SN}_l$ and $\mathrm{SP}_l$, respectively)
and the upper bounds of 1. Using Monte Carlo sampling, this
procedure is repeated multiple times to generate sets of
possible (SN and SP) combinations.

The proposed procedure is a hierarchical process that
starts with (a) selecting a set of ($k_l$, Pr) values from
specified distributions to calculate ($\mathrm{SN}_l$,
$\mathrm{SP}_l$; Eqs. (3) and (4)), is followed by (b) selecting
a candidate set of (SN and SP) from values uniformly
distributed between the lower bounds ($\mathrm{SN}_l$ and
$\mathrm{SP}_l$) and the upper theoretical maximum of 1, and is
completed by (c) imposing constraints on the candidate set
of (SN and SP) that are implied by Eqs. (1) and (2) (see
next paragraph for details of the last step). The purpose
of step (a) in the procedure is to calculate $\mathrm{SN}_l$
and $\mathrm{SP}_l$. The purpose of step (b) is to sample
candidate values of SN and SP that lie between their
respective theoretical lower and upper boundaries. The
purpose of step (c) is to limit the sets of values of SN
and SP selected in step (b) to only those that, first, are
congruent with the theoretical model that relates validity
to reliability (Eq. 1), and, second, satisfy the assumption
that classification of exposure is better than random
(Eq. 2).

By chance, some values of Pr, SN and SP selected in this
way will correspond to values of k, implied by Eq. (1), that
lie outside of the bounds on k that we have specified by
choosing specific values of $k_l$ and $k_h$ from the
corresponding distributions. Furthermore, some combinations
of SN and SP will not be consistent with Eq. (2) (ie, they
imply that exposure classification was worse than chance).
Consequently, the candidate sets of values of SN and SP that
are not in agreement with our starting assumptions are
eliminated from the sample used to estimate the
distributions of SN and SP. The resulting combinations are
consistent with our knowledge of agreement between different
exposure assessment methods and indicate how valid these
exposure assessment methods can be expected to be in general.
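Steps (a) to (c) can be sketched in Python as a simple rejection sampler. This is a minimal illustration under the stated assumptions and is not the authors' R implementation from Appendix 1; the function names, the default sample size and the random seed are hypothetical. The inputs in the usage lines are those of the illustrative PAH example in RESULTS, but the output of this sketch need not reproduce the published figures.

```python
import random
from typing import List, Tuple


def kappa_from_validity(sn: float, sp: float, pr: float) -> float:
    """Kappa implied by Eq. (1) for two raters with equal SN and SP."""
    p_pos = pr * sn + 1 - sp - pr + pr * sp
    numerator = pr * (sp - 1 + sn) ** 2 * (pr - 1)
    denominator = (pr * sn - sp - pr + pr * sp) * p_pos
    return numerator / denominator


def simulate_validity(a1: float, a2: float, b1: float, b2: float,
                      c: float, d: float, n_samples: int = 10_000,
                      seed: int = 0) -> List[Tuple[float, float]]:
    """Return the (SN, SP) pairs that survive steps (a) to (c)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        # Step (a): draw the bounds on kappa and the prevalence,
        # then compute the lower bounds from Eqs. (3) and (4).
        k_l = rng.uniform(a1, a2)
        k_h = rng.uniform(b1, b2)
        pr = rng.betavariate(c, d)
        sn_l = k_l / ((1 - pr) + k_l * pr)
        sp_l = k_l / (pr + k_l * (1 - pr))
        # Step (b): candidate SN and SP between the lower bound and 1.
        sn = rng.uniform(sn_l, 1.0)
        sp = rng.uniform(sp_l, 1.0)
        # Step (c): keep only candidates that are better than chance
        # (Eq. 2) and imply a kappa inside [k_l, k_h] (Eq. 1).
        if sn + sp <= 1:
            continue
        if k_l <= kappa_from_validity(sn, sp, pr) <= k_h:
            accepted.append((sn, sp))
    return accepted


if __name__ == "__main__":
    pairs = simulate_validity(0.29, 0.31, 0.59, 0.61, 6.2, 99.7)
    mean_sn = sum(sn for sn, _ in pairs) / len(pairs)
    mean_sp = sum(sp for _, sp in pairs) / len(pairs)
    print(len(pairs), mean_sn, mean_sp)
```

Rejection sampling is used here because it mirrors the description in the text: candidates are drawn first and then discarded if they conflict with the starting assumptions, so the number of retained pairs is smaller than the number of draws.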

The calculation can be implemented in R; the code is
available in Appendix 1 (available online), with input
values specific to the illustrative example described below.

RESULTS 

We apply our method to information provided in table 2
in the article by Offermans et al 1 for PAH exposure
assessment. First, we define the distributions of $k_l$
and $k_h$ for PAH by using U as $k_l \sim U(0.29, 0.31)$ and
$k_h \sim U(0.59, 0.61)$. Some degree of judgement is
involved in this, but our formulation reflects the
observation that in this case k for PAHs lies between 0.3
and 0.6. We further define the distribution of Pr (mode of
5%, with 95% certainty that Pr does not exceed 10%) as
$\mathrm{Pr} \sim \mathrm{Beta}(6.2, 99.7)$. 5 The results of the rest
of the calculations are summarised in figure 1, derived from
10 000 Monte Carlo samples for candidate values of SN and SP
(step (b) above). They reveal that the mean SN for this
example is about 0.78 (SD 0.15) and the mean SP is about
0.96 (SD 0.03).
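The stated properties of the prevalence distribution can be checked with a short Python sketch. The mode of a Beta(c, d) distribution with c > 1 and d > 1 is (c - 1)/(c + d - 2), which for Beta(6.2, 99.7) is 5.2/103.9, or about 0.05. The share of the distribution at or below 10% is estimated here by simulation; the function names, sample size and seed are hypothetical.

```python
import random


def beta_mode(c: float, d: float) -> float:
    """Mode of Beta(c, d); valid only for c > 1 and d > 1."""
    return (c - 1) / (c + d - 2)


def share_at_or_below(c: float, d: float, threshold: float,
                      n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X <= threshold) for X ~ Beta(c, d)."""
    rng = random.Random(seed)
    hits = sum(rng.betavariate(c, d) <= threshold for _ in range(n))
    return hits / n


if __name__ == "__main__":
    print(beta_mode(6.2, 99.7))                # mode of the prevalence prior
    print(share_at_or_below(6.2, 99.7, 0.10))  # estimated P(Pr <= 10%)
```

The same two functions can be used to try other (c, d) pairs when the prevalence in a different study calls for a different prior.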

Figure 1 Plausible pairs of sensitivity (SN) and specificity
(SP) values for exposure-assessment methods for polycyclic
aromatic hydrocarbons evaluated in ref. 1; hashed lines
denote means.

DISCUSSION

Our approach allows investigators to evaluate how
good their exposure-assessment methods truly are, not
just how well they agree with each other, and should
lead to incorporation of information on the validity of
expert assessment methods into formal uncertainty analyses
in epidemiology. 6 Specifically, once we can represent
knowledge about SN and SP by a joint distribution, we
can use a number of existing techniques to evaluate the
impact of exposure misclassification on the
epidemiological results and to correct such results for
known imperfections in exposure classification. Until now,
knowledge of k and exposure prevalence did not enable such
analyses. It is noteworthy that Bayesian analyses that
appraised SN and SP of another job-exposure matrix
produced a very similar appraisal for SP and a lower value
for average SN, with a similarly wide distribution. 7 8 This
perhaps points to a commonality in the quality of expert
assessment methods used in occupational epidemiology.
It is important to note that simple comparison of
measures of agreement across studies and instruments is not
helpful because values of k depend on Pr, which
may differ between applications even for the same SN
and SP. Our method has a distinct advantage for such
comparisons and assessment of validity. With knowledge
about validity, even if it is uncertain, we can begin the
work on incorporating this knowledge into routine
epidemiological analyses. 9
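The dependence of k on Pr for fixed SN and SP follows directly from Eq. (1) and can be illustrated with a short Python sketch. It uses the mean values from RESULTS (SN = 0.78, SP = 0.96); the three prevalences and the function names are hypothetical choices made for illustration.

```python
from typing import Dict, Iterable


def kappa_from_validity(sn: float, sp: float, pr: float) -> float:
    """Kappa implied by Eq. (1) for two raters with equal SN and SP."""
    p_pos = pr * sn + 1 - sp - pr + pr * sp
    numerator = pr * (sp - 1 + sn) ** 2 * (pr - 1)
    denominator = (pr * sn - sp - pr + pr * sp) * p_pos
    return numerator / denominator


def kappa_by_prevalence(sn: float, sp: float,
                        prevalences: Iterable[float]) -> Dict[float, float]:
    """Map each prevalence to the kappa implied by the same SN and SP."""
    return {pr: kappa_from_validity(sn, sp, pr) for pr in prevalences}


if __name__ == "__main__":
    # Same validity, three hypothetical exposure prevalences.
    for pr, k in kappa_by_prevalence(0.78, 0.96, [0.05, 0.2, 0.5]).items():
        print(f"Pr = {pr:.2f}: k = {k:.3f}")
```

The same pair of SN and SP yields visibly different k values at different prevalences, so two studies can report different agreement for equally valid methods.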